Predictive models of gene regulation

Author

  • Anshul Bharat Kundaje
Abstract

The regulation of gene expression plays a central role in the development and function of a living cell. A complex network of interacting regulatory proteins binds specific sequence elements in the genome to control the amount and timing of gene expression. The abundance of genome-scale datasets from different organisms provides an opportunity to accelerate our understanding of the mechanisms of gene regulation. Developing computational tools to infer gene regulation programs from high-throughput genomic data is one of the central problems in computational biology.

In this thesis, we present a new predictive modeling framework for studying gene regulation. We formulate the problem of learning regulatory programs as a binary classification task: to accurately predict the condition-specific activation (up-regulation) and repression (down-regulation) of gene expression. The gene expression response is measured by microarray expression data. Genes are represented by various genomic regulatory sequence features. Experimental conditions are represented by the gene expression levels of various regulatory proteins. We use this combination of features to learn a prediction function for the regulatory response of genes under different experimental conditions. The core computational approach is based on boosting. Boosting algorithms allow us to learn high-accuracy, large-margin classifiers and avoid overfitting.

We describe three applications of our framework to study gene regulation:

  • In the GeneClass algorithm, we use a compendium of known transcription factor binding sites and gene expression data to learn a global context-specific regulation program that accurately predicts differential expression. GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. We introduce a novel robust variant of boosting that improves stability and biological interpretability in the presence of correlated features. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the framework.
  • In several organisms, the DNA binding sites of many transcription factors are unknown. Hence, automatic discovery of regulatory sequence motifs is required. In the MEDUSA algorithm, we integrate raw promoter sequence data and gene expression data to simultaneously discover cis-regulatory motifs ab initio and learn predictive regulatory programs. MEDUSA automatically learns probabilistic representations of motifs and their corresponding target genes. We show that we are able to accurately learn the binding sites of most known transcription factors in yeast.
  • We also design new techniques for extracting biologically and statistically significant information from the learned regulatory models. We use a margin-based score to extract global condition-specific regulomes as well as cluster-specific and gene-specific regulation programs. We develop a post-processing framework for interpreting and visualizing biological information encapsulated in our models.

We show the utility of our framework in analyzing several interesting biological contexts (environmental stress responses, DNA-damage response and hypoxia response) in the budding yeast Saccharomyces cerevisiae. We also show that our methods can learn regulatory programs and cis-regulatory motifs in higher eukaryotes such as worms and humans. Several hypotheses generated by our methods are validated by our collaborators using biochemical experiments. Experimental results demonstrate that our framework is quantitatively and qualitatively predictive. We are able to achieve high prediction accuracy on test data and also generate specific, testable hypotheses.
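The classification setup described in the abstract can be illustrated with a small sketch. It is not the GeneClass or MEDUSA implementation: it uses plain discrete AdaBoost over single-feature stumps rather than alternating decision trees or the robust boosting variant, and the data are synthetic. The function names (`pair_features`, `stump`, `adaboost`, `margin`), the toy sizes, and the planted rule "motif 0 present and regulator 1 up implies up-regulation" are all assumptions made for illustration. Each example is a (gene, condition) pair, each feature pairs the presence of a motif in the gene with the discretized expression state of a regulator in the condition, and the label is +1 for up-regulation and -1 for down-regulation.

```python
import math
from itertools import product

N_MOTIFS, N_REGS = 3, 2  # toy sizes, chosen only for illustration

def pair_features(motifs, reg_states):
    """One binary feature per (motif, regulator, state): motif present in the
    gene AND regulator up (+1) or down (-1) in the condition."""
    return [1 if (m == 1 and r == s) else 0
            for m in motifs for r in reg_states for s in (+1, -1)]

def stump(x, j, sign):
    """Weak rule: predict `sign` when feature j is on, `-sign` otherwise."""
    return sign if x[j] == 1 else -sign

def adaboost(X, y, rounds):
    """Plain discrete AdaBoost over single-feature stumps."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                      # uniform example weights
    model = []                             # list of (alpha, feature, sign)
    for _ in range(rounds):
        best = None
        for j in range(d):
            for sign in (+1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump(xi, j, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, sign)
        err, j, sign = best
        if err >= 0.5:                     # no rule better than chance
            break
        err = max(err, 1e-10)              # avoid log of infinity
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, sign))
        # Up-weight the examples the chosen rule got wrong, then normalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, sign))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Weighted vote of all selected rules; its sign is the prediction."""
    return sum(a * stump(x, j, s) for a, j, s in model)

def margin(model, x, label):
    """Normalized margin in [-1, 1]; positive means a correct prediction."""
    return label * score(model, x) / sum(a for a, _, _ in model)

# Synthetic (gene, condition) examples with one planted regulatory rule.
X, y = [], []
for motifs in product((0, 1), repeat=N_MOTIFS):
    for regs in product((-1, 0, 1), repeat=N_REGS):
        X.append(pair_features(motifs, regs))
        y.append(+1 if motifs[0] == 1 and regs[1] == +1 else -1)

model = adaboost(X, y, rounds=5)
predictions = [1 if score(model, x) > 0 else -1 for x in X]
margins = [margin(model, x, label) for x, label in zip(X, y)]
```

Because every selected rule names one motif and one regulator state, the fitted model can be read as a list of candidate regulatory interactions, and the per-example margin gives a confidence value of the kind a margin-based score might rank. How the thesis actually defines and uses its margin-based score is not specified in this abstract.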

Publication date: 2008